{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pytorch的数据读取\n",
    "Pytorch的数据读取非常方便, 可以很容易地实现多线程数据预读. 我个人认为编程难度比TF小很多，而且灵活性也更高.\n",
    "\n",
    "Pytorch的数据读取主要包含三个类:\n",
    "\n",
    "1. Dataset\n",
    "2. DataLoader\n",
    "3. DataLoaderIter\n",
    "\n",
    "这三者大致是一个依次封装的关系: 1.被装进2., 2.被装进3.\n",
    "\n",
    "DataLoader本质上就是一个iterable（跟python的内置类型list等一样），并利用多进程来加速batch data的处理，使用yield来使用有限的内存\n",
    "\n",
    "① 创建一个 Dataset 对象\n",
    "② 创建一个 DataLoader 对象\n",
    "③ 循环这个 DataLoader 对象，将img, label加载到模型中进行训练\n",
    "\n",
    "DataLoader 创建 Iter， 调用 next()\n",
    "\n",
    "# torch.utils.data\n",
    "\n",
    "## Dataset\n",
    "\n",
    "表示Dataset的抽象类。所有其他数据集都应该进行子类化。 所有子类应该override `__len__` 和`__getitem__`，前者提供了数据集的大小，后者支持整数索引，范围从0到len(self)\n",
    "\n",
    "```python \n",
    "class Dataset(object):\n",
    "\t# 强制所有的子类override getitem和len两个函数，否则就抛出错误；\n",
    "\t# 输入数据索引，输出为索引指向的数据以及标签；\n",
    "\tdef __getitem__(self, index):\n",
    "\t\traise NotImplementedError\n",
    "\t\n",
    "\t# 输出数据的长度\n",
    "\tdef __len__(self):\n",
    "\t\traise NotImplementedError\n",
    "\t\t\n",
    "\tdef __add__(self, other):\n",
    "\t\treturn ConcatDataset([self, other])\n",
    "    ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subset\n",
    "\n",
    "`class torch.utils.data.Subset(dataset, indices)`\n",
    "\n",
    "选取特殊索引下的数据子集； dataset：数据集； indices：想要选取的数据的索引；\n",
    "\n",
    "### random_split\n",
    "\n",
    "`class torch.utils.data.random_split(dataset, lengths):`\n",
    "随机不重复分割数据集； dataset：要被分割的数据集 lengths：长度列表，e.g. [7, 3]， 保证7+3=len(dataset)"
   ]
  },
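  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the two helpers described above, here is a minimal sketch on a made-up ten-element `TensorDataset`. The names `toy`, `evens`, `part_a` and `part_b` are illustrative only, and passing `generator=` assumes a PyTorch version whose `random_split` accepts that argument.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch.utils.data import Subset, TensorDataset, random_split\n",
    "\n",
    "# Toy dataset of 10 samples; sample i holds the value i (made up for illustration).\n",
    "toy = TensorDataset(torch.arange(10))\n",
    "\n",
    "# Subset keeps only the listed indices, in the given order.\n",
    "evens = Subset(toy, [0, 2, 4, 6, 8])\n",
    "even_values = [int(evens[i][0]) for i in range(len(evens))]\n",
    "\n",
    "# random_split needs lengths that sum to len(dataset); the seeded generator makes the split repeatable.\n",
    "part_a, part_b = random_split(toy, [7, 3], generator=torch.Generator().manual_seed(0))\n",
    "\n",
    "# Collect every value that ended up in either part.\n",
    "seen = sorted(int(part[i][0]) for part in (part_a, part_b) for i in range(len(part)))\n",
    "```\n",
    "\n",
    "The two parts returned by `random_split` are disjoint and together cover every index exactly once, which is what `seen` lets you verify."
   ]
  },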
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 划分 Teat Train Valid\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch.utils import data\n",
    "import random\n",
    "\n",
    "master = data.Dataset( ... )  # your \"master\" dataset\n",
    "n = len(master)  # how many total elements you have\n",
    "n_test = int( n * .05 )  # number of test/val elements\n",
    "n_train = n - 2 * n_test\n",
    "idx = list(range(n))  # indices to all elements\n",
    "random.shuffle(idx)  # in-place shuffle the indices to facilitate random splitting\n",
    "train_idx = idx[:n_train]\n",
    "val_idx = idx[n_train:(n_train + n_test)]\n",
    "test_idx = idx[(n_train + n_test):]\n",
    "\n",
    "train_set = data.Subset(master, train_idx)\n",
    "val_set = data.Subset(master, val_idx)\n",
    "test_set = data.Subset(master, test_idx)\n",
    "```\n",
    "\n",
    "This can also be achieved using data.random_split:\n",
    "\n",
    "```python\n",
    "train_set, val_set, test_set = data.random_split(master, (n_train, n_val, n_test))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append(\"..\") \n",
    "import dl_utils\n",
    "import torch\n",
    "import torchvision\n",
    "train_iter, test_iter = dl_utils.load_data_fashion_mnist(batch_size = 256)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mnist_train  len Dataset FashionMNIST\n",
      "    Number of datapoints: 60000\n",
      "    Root location: C:\\Users\\Jarvis/Datasets/FashionMNIST\n",
      "    Split: Train\n",
      "mnist_test  len Dataset FashionMNIST\n",
      "    Number of datapoints: 10000\n",
      "    Root location: C:\\Users\\Jarvis/Datasets/FashionMNIST\n",
      "    Split: Test\n",
      "48000\n",
      "12000\n"
     ]
    }
   ],
   "source": [
    "#将训练数据 分为k份，其中k-1 为训练集 ，k为验证集\n",
    "root='~/Datasets/FashionMNIST'\n",
    "\n",
    "trans = []\n",
    "trans.append(torchvision.transforms.ToTensor())    \n",
    "transform = torchvision.transforms.Compose(trans)\n",
    "\n",
    "mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)\n",
    "mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)\n",
    "\n",
    "print('mnist_train  len {}'.format(mnist_train))\n",
    "\n",
    "print('mnist_test  len {}'.format(mnist_test))\n",
    "\n",
    "def validation_dataset(train_iter, k):\n",
    "    import random\n",
    "    n = len(train_iter)\n",
    "    idx = list(range(n))  # indices to all elements\n",
    "    random.shuffle(idx)  # in-place shuffle the indices to facilitate random splitting\n",
    "    #形成一个对训练集idx 的 随机排序\n",
    "    vail_index = idx[0 : int(n/k)]\n",
    "    train_index = idx[int(n/k) : ]\n",
    "    val_set = torch.utils.data.Subset(train_iter, vail_index)\n",
    "    train_set = torch.utils.data.Subset(train_iter, train_index)\n",
    "    return train_set, val_set\n",
    "\n",
    "train_set, val_set = validation_dataset(mnist_train, 5)\n",
    "\n",
    "print(train_set.__len__())\n",
    "print(val_set.__len__())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "48000\n",
      "12000\n"
     ]
    }
   ],
   "source": [
    "def random_split_validation(train_data, k):\n",
    "    n = len(train_data)\n",
    "    n_val= int (n / k )\n",
    "    n_train = n - n_val\n",
    "    return torch.utils.data.random_split(train_data, (n_train, n_val))\n",
    "    \n",
    "train_set2, val_set2 = random_split_validation(mnist_train, 5)\n",
    "print(len(train_set2))\n",
    "print(len(val_set2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "908\n",
      "D:\\PointNet\\PointNet-PyTorch-master\\data\\ModelNet10_\\bathtub\\test\\bathtub_0107.txt\n",
      "RangeIndex(start=0, stop=908, step=1)\n",
      "Index(['Path', ' Class'], dtype='object')\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "data = pd.read_csv('../test_files.csv') \n",
    "print(len(data))\n",
    "print(data.iloc[0,0])\n",
    "print(data.index)\n",
    "print(data.columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 举个例子\n",
    "## 在Data/toy-points-dataset 中有三个类别的数据\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'bathtub': 0, 'bed': 1, 'chair': 2, 'desk': 3, 'dresser': 4, 'monitor': 5, 'night_stand': 6, 'sofa': 7, 'table': 8, 'toilet': 9}\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "data_set_path = 'D:\\\\PointNet\\\\PointNet-PyTorch-master\\\\data\\\\ModelNet10_'\n",
    "name = os.listdir(data_set_path)\n",
    "num = list(range(0, len(name)))\n",
    "class_num_dict = dict(zip(name, num))\n",
    "print(class_num_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['D:\\\\PointNet\\\\PointNet-PyTorch-master\\\\data\\\\ModelNet10_\\\\bathtub\\\\test\\\\bathtub_0107.txt', 'bathtub']\n"
     ]
    }
   ],
   "source": [
    "from torch.utils.data.dataset import Dataset\n",
    "import os\n",
    "#class_num_dict = {'bed':0, 'sofa':1, 'desk':2, 'bathhub':3}\n",
    "def get_path_piar(data_set_path, is_train = True):\n",
    "    class_names = os.listdir(data_set_path)\n",
    "    num = list(range(0, len(class_names)))\n",
    "    class_num_dict = dict(zip(class_names, num))\n",
    "    all_files = [['Path',' Class']]\n",
    "    for index, class_name in enumerate(class_names):\n",
    "        if(is_train): \n",
    "            file_path = os.path.join(data_set_path,class_name, 'train')\n",
    "        else:\n",
    "            file_path = os.path.join(data_set_path,class_name, 'test')\n",
    "        files = os.listdir(file_path)\n",
    "        \n",
    "        #files_path = [[os.path.join(os.getcwd(),file_path,file), class_name] for file in files ]\n",
    "        files_path = [[os.path.join(os.getcwd(),file_path,file), class_name] for file in files ]\n",
    "        all_files += (files_path)\n",
    "        #list append 与 + 操作不一样\n",
    "    return all_files\n",
    "\n",
    "modelnet10_path = 'D:\\\\PointNet\\\\PointNet-PyTorch-master\\\\data\\\\ModelNet10_'\n",
    "train_files_path = get_path_piar(modelnet10_path, is_train=True) \n",
    "test_files_path = get_path_piar(modelnet10_path, is_train=False) \n",
    "print(test_files_path[1])\n",
    "\n",
    "with open('../train_files.csv','w') as f:\n",
    "    for path in train_files_path:\n",
    "        f.write(path[0] + ',' + path[1] + '\\n')\n",
    "\n",
    "with open('../test_files.csv','w') as f:\n",
    "    for path in test_files_path:\n",
    "        f.write(path[0] + ',' + path[1] + '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 假设我们已经有了训练集与测试集 文件的路径文件\n",
    "\n",
    "\n",
    "参考资料：\n",
    "\n",
    "https://www.pytorchtutorial.com/pytorch-custom-dataset-examples/\n",
    "\n",
    "https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html\n",
    "\n",
    "https://likewind.top/2019/02/01/Pytorch-dataprocess/\n",
    "\n",
    "https://pytorch.org/tutorials/beginner/data_loading_tutorial.html\n",
    "\n",
    "https://www.jianshu.com/p/8ea7fba72673\n",
    "\n",
    "\n",
    "\n",
    "形如一个csv或者txt，存放有：\n",
    "['D:\\\\OneDrive\\\\Desktop\\\\2Learning-Pytorch-Geometric\\\\Learning_Pytorch\\\\Data\\\\toy-points-dataset\\\\bed\\\\test\\\\bed_0533.points', 'bed']\n",
    "\n",
    "getitem 函数 \n",
    "\n",
    "根据 index， 获取一个Path，然后根据path， 读取数据\n",
    "例如 f.read, cv.imread 等\n",
    "\n",
    "然后返回 数据 和 label \n",
    "\n",
    "__len__ 函数一定要写，返回数据集的大小"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from torchvision import transforms\n",
    "from torch.utils.data.dataset import Dataset\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "class ReadDataFromFloder(Dataset):\n",
    "    def __init__(self, data_set_path, is_train = True):\n",
    "        self.data_path = pd.read_csv(data_set_path) \n",
    "        \n",
    "        #写一些transforms操作,对于不同阶段，可能不同，例如train 时候会加入一些噪声，或者旋转等\n",
    "        self.transformations  = {'train': transforms.Compose([transforms.ToTensor()  ]),\n",
    "                                 \n",
    "                                'test': transforms.Compose([transforms.ToTensor()    ])}\n",
    "        self.is_train = is_train\n",
    "     #这个函数根据数据的类型是变化的，因为不同类型的数据，读取为tensor的操作也不同。   \n",
    "    def txt_PointsCloud_parser(self, path_to_off_file):\n",
    "        # Read the OFF file\n",
    "        with open(path_to_off_file, 'r') as f:\n",
    "            contents = f.readlines()\n",
    "        num_vertices = len(contents)\n",
    "        # print(num_vertices)\n",
    "        # Convert all the vertex lines to a list of lists\n",
    "        vertex_list = [list(map(float, contents[i].strip().split(' '))) for i in list(range(0, num_vertices))]\n",
    "        # Return the vertices as a 3 x N numpy array\n",
    "        return np.array(vertex_list)\n",
    "        #return torch.tensor(vertex_list)\n",
    "        \n",
    "    def augment_data(self, vertices):\n",
    "        # Random rotation about the Y-axis\n",
    "        theta = 2 * np.pi * np.random.rand(1)\n",
    "        Ry = np.array([[np.cos(theta), 0, np.sin(theta)],\n",
    "                       [0, 1, 0],\n",
    "                       [-np.sin(theta), 0, np.cos(theta)]], dtype=np.float)\n",
    "        # print(Ry)\n",
    "        vertices = np.dot(vertices, Ry)\n",
    "        # Add Gaussian noise with standard deviation of 0.2\n",
    "\n",
    "        vertices += np.random.normal(scale=0.02, size=vertices.shape)\n",
    "        return vertices\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        # stuff\n",
    "        \n",
    "        #根据index 拿到 对应的文件路径\n",
    "        path =  self.data_path.iloc[index , 0]\n",
    "        \n",
    "        # 从路径 读取数据 这个函数可以优化，例如用h5文件格式\n",
    "        data = self.txt_PointsCloud_parser(path)\n",
    "        \n",
    "        #返回值应该是一个tensor 才能被网络consume, \n",
    "        #所以手动转tensor 或者 transform\n",
    "        \n",
    "        if  self.is_train :\n",
    "\n",
    "            data = self.augment_data(data)\n",
    "            data = self.transformations['train'](data)\n",
    "\n",
    "            \n",
    "        else:\n",
    "            data = self.transformations['test'](data)\n",
    "            \n",
    "        label = self.data_path.iloc[index , 1]\n",
    "       \n",
    "        return torch.squeeze(data) , label\n",
    " \n",
    "    def __len__(self):\n",
    "        return len(self.data_path)\n",
    "    \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "167\n",
      "use 28.844467 s\n"
     ]
    }
   ],
   "source": [
    "point_data_set = ReadDataFromFloder('train_files.csv')\n",
    "point_data_loader = torch.utils.data.DataLoader(point_data_set, batch_size = 24, shuffle=False)\n",
    "import time\n",
    "start = time.time()\n",
    "i = 0\n",
    "for data, label in point_data_loader:\n",
    "    i+=1\n",
    "print(i)\n",
    "print('use %f s'% (time.time() - start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# H5Py 格式\n",
    "\n",
    "https://geektutu.com/post/tensorflow-make-npy-hdf5-data-set.html#%E5%88%B6%E4%BD%9CHDF5%E6%A0%BC%E5%BC%8F%E7%9A%84%E6%95%B0%E6%8D%AE%E9%9B%86\n",
    "\n",
    "https://www.neusncp.com/user/blog?id=97\n",
    "\n",
    "https://zhuanlan.zhihu.com/p/34405536\n",
    "\n",
    "https://github.com/Lyken17/Efficient-PyTorch\n",
    "\n",
    "https://www.cnblogs.com/nwpuxuezha/p/7846751.html\n",
    "\n",
    "https://towardsdatascience.com/hdf5-datasets-for-pytorch-631ff1d750f5\n",
    "\n",
    "从上面看，遍历一遍dataloader比较慢，单线程 时间需要28.5s，如果我们将数据集先遍历一遍，batch_size = 1\n",
    "然后将每个数据与标签，加入到一个list\n",
    "最后用一个h5py文件保存，让数据集全部数据存在一个.h5文件数据库中。\n",
    "\n",
    "\n",
    "## 读取h5py\n",
    "\n",
    "1. 读取之后，然后用 H5Dataset 通过tensor构造数据集，由于h5py返回numpy数组，所以在dataset构造函数里判断是否为numpy array,如果是先转换为tensor\n",
    "\n",
    "通过这样构造的Dataset，遍历一遍只要0.004s， 200倍。\n",
    "\n",
    "多线程： 普通文件读取需要："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "195.67671232876714"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "28.5688/0.146\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import h5py\n",
    "if f:\n",
    "    f.close()\n",
    "\n",
    "feature_num = 3\n",
    "data_h5py, label_h5py = [], []\n",
    "\n",
    "#对于数据集进行遍历，然后加入list，存入disk\n",
    "point_data_loader = torch.utils.data.DataLoader(point_data_set, batch_size = 1, shuffle=False)\n",
    "for data, label in point_data_loader:\n",
    "    data_h5py.append((torch.squeeze(data)).numpy())\n",
    "    label_h5py.append(class_num_dict[label[0]])\n",
    "    # label_one_hot = [0 if i != class_num_dict[label[0]] else 1 for i in range(feature_num24)] \n",
    "    \n",
    "with h5py.File('data.h5','w') as f:\n",
    "    f.create_dataset('points', data = data_h5py )\n",
    "    f.create_dataset('label', data =  label_h5py)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(3991, 2000, 3) (3991,)\n"
     ]
    }
   ],
   "source": [
    "with h5py.File('data.h5', 'r') as f:\n",
    "    x, y = f['points'][()], f['label'][()]\n",
    "    \n",
    "print(x.shape, y.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "class H5Dataset(Dataset):\n",
    "    \"\"\"Dataset wrapping data and target tensors.\n",
    "\n",
    "    Each sample will be retrieved by indexing both tensors along the first\n",
    "    dimension.\n",
    "\n",
    "    Arguments:\n",
    "        data_tensor (Tensor): contains sample data.\n",
    "        target_tensor (Tensor): contains sample targets (labels).\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, data_tensor, target_tensor):\n",
    "        assert data_tensor.shape[0] == target_tensor.shape[0]\n",
    "        if isinstance(x, np.ndarray):\n",
    "            \n",
    "            self.data_tensor = torch.from_numpy(data_tensor)\n",
    "            self.target_tensor = torch.from_numpy(target_tensor)\n",
    "            \n",
    "        else:\n",
    "            self.data_tensor = data_tensor\n",
    "            self.target_tensor = target_tensor\n",
    "\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        # print(index)\n",
    "        return self.data_tensor[index], self.target_tensor[index]\n",
    "\n",
    "    def __len__(self):\n",
    "        return self.data_tensor.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "h5_point_set = H5Dataset(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## h5py文件 不支持num_worker > 0 ，不支持多线程读取，要改多线程，很麻烦。\n",
    "感谢来自评论区@Tio 同学的分享：\n",
    "\n",
    "背景：我们知道Torch框架需要符合其自身规格的输入数据的格式，在图像识别中用到的是以.t7扩展名的文件类型，同时也有h5格式类型，这种类型的和t7差不多，均可被torch框架使用，但在读入时候有个官方BUG\n",
    "问题：DataLoader, when num_worker >0, there is bug 读入.h5 数据格式时候如果dataloader>0 内存会占满，并报错\n",
    "问题解决："
   ]
  },
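  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One commonly used workaround, sketched here under assumptions and not necessarily the fix the commenter had in mind, is to open the HDF5 file lazily inside `__getitem__`, so that each worker process ends up with its own file handle. `LazyH5Dataset` and the demo file are hypothetical; the dataset names 'points' and 'label' follow the file written earlier in this notebook. The demo runs with num_workers=0 to stay portable.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import h5py\n",
    "import numpy as np\n",
    "import torch\n",
    "from torch.utils.data import DataLoader, Dataset\n",
    "\n",
    "\n",
    "class LazyH5Dataset(Dataset):\n",
    "    # Hypothetical dataset that opens the HDF5 file on first access.\n",
    "    def __init__(self, h5_path):\n",
    "        self.h5_path = h5_path\n",
    "        self._file = None  # opened lazily, once per process\n",
    "        # Read only the length up front, then close the file again.\n",
    "        with h5py.File(h5_path, 'r') as f:\n",
    "            self._length = f['points'].shape[0]\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        if self._file is None:\n",
    "            self._file = h5py.File(self.h5_path, 'r')\n",
    "        points = torch.from_numpy(self._file['points'][index])\n",
    "        label = int(self._file['label'][index])\n",
    "        return points, label\n",
    "\n",
    "    def __len__(self):\n",
    "        return self._length\n",
    "\n",
    "\n",
    "# Write a tiny made-up file: 8 samples of 5 points each.\n",
    "demo_path = os.path.join(tempfile.mkdtemp(), 'lazy_demo.h5')\n",
    "with h5py.File(demo_path, 'w') as f:\n",
    "    f.create_dataset('points', data=np.zeros((8, 5, 3), dtype=np.float32))\n",
    "    f.create_dataset('label', data=np.arange(8))\n",
    "\n",
    "lazy_set = LazyH5Dataset(demo_path)\n",
    "lazy_loader = DataLoader(lazy_set, batch_size=4, num_workers=0)\n",
    "batches = [(tuple(x.shape), y.tolist()) for x, y in lazy_loader]\n",
    "```\n",
    "\n",
    "The design choice is that nothing holding an open file is created in `__init__`, so the dataset object can be handed to worker processes before any handle exists."
   ]
  },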
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "167\n",
      "0.1466062068939209\n"
     ]
    }
   ],
   "source": [
    "point_data_loader2 = torch.utils.data.DataLoader(h5_point_set, batch_size = 24, shuffle=False,num_workers=0)\n",
    "start = time.time()\n",
    "i = 0\n",
    "X = torch.zeros(2000,3)\n",
    "for x,y in point_data_loader2:\n",
    "    i+=1\n",
    "print(i)\n",
    "print(time.time() - start)\n",
    "\n",
    "def draw_Point_Cloud(Points, Lables, axis = True, **kags):\n",
    "    import matplotlib.pyplot as plt\n",
    "    from mpl_toolkits.mplot3d import Axes3D\n",
    "    x_axis = Points[:,0]\n",
    "    y_axis = Points[:,1]\n",
    "    z_axis = Points[:,2]\n",
    "    fig = plt.figure() \n",
    "    ax = Axes3D(fig) \n",
    "\n",
    "    ax.scatter(x_axis, y_axis, z_axis, c = Lables)\n",
    "    # 设置坐标轴显示以及旋转角度\n",
    "    ax.set_xlabel('x') \n",
    "    ax.set_ylabel('y')\n",
    "    ax.set_zlabel('z')\n",
    "    ax.view_init(elev=10,azim=235)\n",
    "    if not axis:\n",
    "        #关闭显示坐标轴\n",
    "        plt.axis('off')\n",
    "    \n",
    "    plt.show()\n",
    "#draw_Point_Cloud(X.numpy(), Lables=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3326\n",
      "665\n"
     ]
    }
   ],
   "source": [
    "#划分训练集 ，验证集，h5数据集，同样ok\n",
    "train_set2, val_set2 = random_split_validation(h5_point_set, 6)\n",
    "print(len(train_set2))\n",
    "print(len(val_set2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "if f:\n",
    "    f.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
      "Requirement already satisfied: h5py in c:\\users\\jarvis\\appdata\\roaming\\python\\python36\\site-packages (2.9.0)\n",
      "Requirement already satisfied: six in c:\\users\\jarvis\\appdata\\roaming\\python\\python36\\site-packages (from h5py) (1.12.0)\n",
      "Requirement already satisfied: numpy>=1.7 in d:\\python36\\lib\\site-packages (from h5py) (1.16.4)\n"
     ]
    }
   ],
   "source": [
    "!pip install h5py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
